A measure of phonetic similarity to quantify pronunciation variation by using ASR technology

Authors

  • Tianze Shi
  • Shun Kasahara
  • Teeraphon Pongkittiphan
  • Nobuaki Minematsu
  • Daisuke Saito
  • Keikichi Hirose

Abstract

How to define a quantitative measure of phonetic similarity between IPA transcripts of the same sentence read by two speakers has attracted researchers' interest. The problem can be divided into two parts: how to align the two transcripts and how to quantify the gaps in the alignment. In this paper, we introduce a method of similarity calculation using phone-based or phoneme-based acoustic models trained with the algorithm used to develop Automatic Speech Recognition (ASR) systems. The use of acoustic models introduces an issue of speaker dependency, because speech spectra always convey information about the training speakers' age and gender, which is irrelevant to phonetic similarity calculation. We examine how independent our method is of the training speakers and how close the calculated similarity is to the similarity rated subjectively in a listening test. We also compare our method with recent work and show that it yields a correlation with human-rated similarity that is higher by 4 points.
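
The abstract splits the problem into aligning two transcripts and quantifying the gaps in that alignment. Below is a minimal Python sketch of that two-part structure, under assumptions that are ours rather than the paper's: each phone is represented by a hypothetical diagonal-covariance Gaussian (`TOY_MODELS`, standing in for an ASR-trained acoustic model), the substitution cost is the Bhattacharyya distance between two phone Gaussians, insertions and deletions carry a constant `gap` cost, and the total alignment cost is mapped to a similarity in (0, 1]. The phone models and numbers are invented for illustration, and the sketch does not address the speaker-dependency issue discussed above.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A phone model is (means, variances) of a diagonal-covariance Gaussian.
PhoneModel = Tuple[List[float], List[float]]

# Hypothetical toy models; real ones would come from acoustic model training.
TOY_MODELS: Dict[str, PhoneModel] = {
    "i": ([1.0, 2.0], [1.0, 1.0]),
    "e": ([1.5, 2.0], [1.0, 1.0]),
    "a": ([4.0, 0.5], [1.0, 1.0]),
}


def bhattacharyya(a: PhoneModel, b: PhoneModel) -> float:
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    (m1, v1), (m2, v2) = a, b
    d = 0.0
    for mu1, mu2, s1, s2 in zip(m1, m2, v1, v2):
        s = (s1 + s2) / 2.0  # averaged variance in this dimension
        d += (mu1 - mu2) ** 2 / (8.0 * s) + 0.5 * math.log(s / math.sqrt(s1 * s2))
    return d


def align_cost(x: Sequence[str], y: Sequence[str],
               models: Dict[str, PhoneModel], gap: float = 1.0) -> float:
    """Minimum total cost of aligning x and y (weighted edit distance)."""
    n, m = len(x), len(y)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap  # delete all of x[:i]
    for j in range(1, m + 1):
        dp[0][j] = j * gap  # insert all of y[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = bhattacharyya(models[x[i - 1]], models[y[j - 1]])
            dp[i][j] = min(dp[i - 1][j - 1] + sub,  # substitute / match
                           dp[i - 1][j] + gap,      # gap in y
                           dp[i][j - 1] + gap)      # gap in x
    return dp[n][m]


def similarity(x: Sequence[str], y: Sequence[str],
               models: Dict[str, PhoneModel], gap: float = 1.0) -> float:
    """Map length-normalized alignment cost to (0, 1]; 1.0 means identical."""
    cost = align_cost(x, y, models, gap)
    return 1.0 / (1.0 + cost / max(len(x), len(y), 1))


if __name__ == "__main__":
    print(similarity(["a", "i"], ["a", "e"], TOY_MODELS))  # close vowels
    print(similarity(["a", "i"], ["a", "a"], TOY_MODELS))  # distant vowels
    print(similarity(["i", "a"], ["i"], TOY_MODELS))       # one deletion
```

With these toy models, substituting /e/ for /i/ costs far less than substituting /a/, so the first pair scores higher than the second. The normalization by the longer transcript length is one simple choice among many.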

Similar resources

Using an underspecified ASR system as an indicator for phonetic similarity

This paper presents a novel approach to the identification of phonetic similarity using properties observed during the speech recognition process. An experiment is presented whereby specific phones are removed during the training phase of a statistical speech recognition system so that the behaviour of the system can be analysed to see which alternative phone is selected. The domain of the anal...

Using accent information in ASR models for Swedish

A common technique to cope with the large variability in the acoustic realisations of the phonetic classes in speech is to partition the data according to a linguistically significant variable. In this work, accent dependent phonetic models were trained and used both as an analysis tool for pronunciation variation and in the attempt to improve ASR performance. The Idea Accent dependent trainin...

A study of implicit and explicit modeling of coarticulation and pronunciation variation

In this paper, we focus on the modeling of coarticulation and pronunciation variation in Automatic Speech Recognition systems (ASR). Most ASR systems explicitly describe these production phenomena through context-dependent phoneme models and multiple pronunciation lexicons. Here, we explore the potential benefit of using feature spaces covering longer time segments in terms of implicit modeling...

Underspecified Feature Models for Pronunciation Variation in ASR

In the 1990s, several studies showed that if we could just predict correctly when to include alternate pronunciations of words in ASR lexica, we could greatly reduce error rates for conversational speech tasks (i.e., Switchboard). But it is clear that the field has thus far failed to reach that potential. Many scholars model pronunciation variation via a substitution of one phonetic sequence fo...

Incorporating Contextual Phonetics into Automatic Speech Recognition

This work outlines the problems encountered in modeling pronunciation for automatic speech recognition (ASR) of spontaneous (American) English speech. We detail some of the phonetic phenomena within the Switchboard corpus that make the recognition of this speaking style difficult. Phonetic transcribers found that feature spreading and cue trading made identification of phonetic segmental bounda...

Publication date: 2015